Conversation

Compare c567dc6 to c01fa70
Pull request overview
Enables multi-datacenter deployments by expanding the DataCenter enum, allowing datacenter=None (all DCs) or lists, and adding first-class support for attaching multiple network volumes (one per datacenter) with corresponding GraphQL payload updates.
Changes:
- Expand `DataCenter` to all supported RunPod DCs and default endpoint/resource datacenter behavior to `None` (no restriction / all DCs).
- Add multi-volume support (accept a single volume or a list; validate one volume per DC; send `networkVolumeIds` when multiple are attached).
- Add CPU datacenter restrictions via the exported `CPU_DATACENTERS`, with validation and clearer errors.
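As a rough illustration of the new input shapes (a minimal sketch; the enum values shown and the helper name `normalize_datacenter` are assumptions for illustration, not the SDK's actual code), normalizing the `datacenter` parameter to an optional list might look like:

```python
from enum import Enum
from typing import Optional, Union

class DataCenter(str, Enum):
    # Hypothetical subset of the expanded enum; values mirror RunPod DC IDs.
    EU_RO_1 = "EU-RO-1"
    US_TX_1 = "US-TX-1"
    EU_SE_1 = "EU-SE-1"

def normalize_datacenter(
    value: Union[None, str, DataCenter, list],
) -> Optional[list]:
    """Accept a single DC, a string, a list, or None (no restriction)."""
    if value is None:
        return None  # no location restriction: all datacenters
    if isinstance(value, DataCenter):
        return [value]
    if isinstance(value, str):
        return [DataCenter(value)]  # look up enum member by its value
    return [v if isinstance(v, DataCenter) else DataCenter(v) for v in value]

assert normalize_datacenter(None) is None
assert normalize_datacenter("EU-RO-1") == [DataCenter.EU_RO_1]
assert normalize_datacenter(DataCenter.US_TX_1) == [DataCenter.US_TX_1]
assert normalize_datacenter(["EU-SE-1", DataCenter.EU_RO_1]) == [
    DataCenter.EU_SE_1,
    DataCenter.EU_RO_1,
]
```

Because `DataCenter` mixes in `str`, the `isinstance` checks must test the enum before the plain-string case.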
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_p2_gaps.py | Updates expectations for default locations/datacenter behavior when unset. |
| tests/unit/test_endpoint.py | Adds coverage for volume normalization to lists and expanded datacenter input forms. |
| tests/unit/test_deprecations.py | Marks CPU_DATACENTERS as non-deprecated public API. |
| tests/unit/resources/test_serverless.py | Updates/extends tests for multi-DC datacenter normalization, locations syncing, multi-volume validation, and CPU DC validation. |
| tests/unit/resources/test_network_volume.py | Adds tests for NetworkVolume(datacenter=...) alias behavior. |
| src/runpod_flash/runtime/resource_provisioner.py | Passes locations from manifest into reconstructed resources. |
| src/runpod_flash/endpoint.py | Accepts volume as single or list; forwards datacenter as-is; serializes multi-volume configs into networkVolumes. |
| src/runpod_flash/core/resources/serverless_cpu.py | Excludes networkVolumes from CPU endpoint payload. |
| src/runpod_flash/core/resources/serverless.py | Adds multi-volume normalization/validation; datacenter normalization; CPU DC validation; deploy/update inject networkVolumeIds. |
| src/runpod_flash/core/resources/network_volume.py | Expands DataCenter; adds from_string; exports CPU_DATACENTERS; adds datacenter alias for dataCenterId. |
| src/runpod_flash/core/resources/load_balancer_sls_resource.py | Excludes networkVolumes from CPU load balancer payload. |
| src/runpod_flash/core/resources/init.py | Re-exports CPU_DATACENTERS. |
| src/runpod_flash/core/api/runpod.py | Extends saveEndpoint selection set to include networkVolumeIds. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Persists locations into the build manifest deployment config. |
| src/runpod_flash/init.py | Exposes CPU_DATACENTERS at the top-level package API. |
| docs/Flash_SDK_Reference.md | Documents expanded DC set, datacenter=None, CPU_DATACENTERS, and multi-volume examples. |
| docs/Flash_Deploy_Guide.md | Updates persistent storage examples for explicit DC targeting and multi-volume usage. |
Comments suppressed due to low confidence (2)
src/runpod_flash/cli/commands/build_utils/manifest.py:242
- Manifest extraction still only serializes the singular `networkVolume` (and `networkVolumeId`), so endpoints configured with multiple volumes (`networkVolumes`) will be written to the manifest with only the first volume. This will break deploy/runtime provisioning for multi-volume configs. Extend manifest extraction to include `networkVolumes` when present (and ensure runtime reconstruction supports it).
```python
if hasattr(resource_config, "locations") and resource_config.locations:
    config["locations"] = resource_config.locations
if hasattr(resource_config, "env") and resource_config.env:
    env_dict = dict(resource_config.env)
    env_dict.pop("RUNPOD_API_KEY", None)
    if env_dict:
        config["env"] = env_dict
if hasattr(resource_config, "networkVolume") and resource_config.networkVolume:
    nv = resource_config.networkVolume
    nv_config = {"name": nv.name}
    if nv.size is not None:
        nv_config["size"] = nv.size
    if hasattr(nv, "dataCenterId") and nv.dataCenterId is not None:
        nv_config["dataCenterId"] = (
            nv.dataCenterId.value
            if hasattr(nv.dataCenterId, "value")
            else nv.dataCenterId
        )
    config["networkVolume"] = nv_config
```
src/runpod_flash/runtime/resource_provisioner.py:157
- `create_resource_from_manifest` can reconstruct only `networkVolume`/`networkVolumeId`, but multi-volume endpoints are now represented via `networkVolumes` (and deployed via `networkVolumeIds`). Without handling those keys, manifest-based provisioning will silently drop additional volumes. Add support for `networkVolumes` (and/or `networkVolumeIds`) reconstruction here.
```python
if "locations" in resource_data:
    deployment_kwargs["locations"] = resource_data["locations"]
# Reconstruct NetworkVolume from manifest data if present
if "networkVolume" in resource_data:
    from runpod_flash.core.resources.network_volume import NetworkVolume

    nv_data = resource_data["networkVolume"]
    deployment_kwargs["networkVolume"] = NetworkVolume(**nv_data)
elif "networkVolumeId" in resource_data:
    deployment_kwargs["networkVolumeId"] = resource_data["networkVolumeId"]
```
```python
# CPU endpoints don't use GPU-specific fields, so exclude them from API payload
# This prevents the RunPod GraphQL API from rejecting CPU endpoints with GPU-specific fields
# Note: instanceIds is NOT in _input_only, so it will be sent to the API
_input_only = {
    "id",
    "cudaVersions",  # GPU-specific, exclude from API payload
    "datacenter",
    "env",
    "gpus",  # Inherited from parent, but always None for CPU endpoints
    "gpuIds",  # GPU-specific API field, exclude from payload
    "gpuCount",  # GPU-specific API field, exclude from payload
    "allowedCudaVersions",  # GPU-specific API field, exclude from payload
    "flashboot",
    "flashEnvironmentId",
    "imageName",
    "networkVolume",
    "networkVolumes",
    "python_version",
}
```
`CpuServerlessEndpoint` now excludes `networkVolumes` from the API payload via `_input_only`, but the overridden `config_hash` below only includes `networkVolume` (singular). That means changing the multi-volume configuration won't change `config_hash`, so drift detection/updates may not trigger. Include `networkVolumes` in the CPU config hash (or otherwise ensure multi-volume changes affect drift detection).
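A toy illustration of why the hashed field set matters for drift detection (a generic sketch; `config_hash` here is a stand-in, not the SDK's actual implementation):

```python
import hashlib
import json

def config_hash(config: dict, hashed_fields: set) -> str:
    # Hash only the selected fields, in a stable order.
    payload = {k: config.get(k) for k in sorted(hashed_fields)}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

one_volume = {"networkVolume": "vol-a", "networkVolumes": ["vol-a"]}
two_volumes = {"networkVolume": "vol-a", "networkVolumes": ["vol-a", "vol-b"]}

# Hashing only the singular field misses the added second volume...
assert config_hash(one_volume, {"networkVolume"}) == config_hash(
    two_volumes, {"networkVolume"}
)
# ...while including the plural field detects the drift.
assert config_hash(one_volume, {"networkVolume", "networkVolumes"}) != config_hash(
    two_volumes, {"networkVolume", "networkVolumes"}
)
```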
```python
_input_only = {
    "id",
    "cudaVersions",
    "datacenter",
    "env",
    "flashboot",
    "flashEnvironmentId",
    "gpus",
    "gpuIds",
    "gpuCount",
    "allowedCudaVersions",
    "imageName",
    "networkVolume",
    "networkVolumes",
    "python_version",
}
```
`CpuLoadBalancerSlsResource` now excludes `networkVolumes` from the API payload via `_input_only`, but its custom `config_hash` implementation later in the class only hashes `networkVolume` (singular). Multi-volume changes may not be detected as drift, preventing needed updates. Include `networkVolumes` in the CPU LB config hash (or ensure multi-volume affects drift detection).
```python
from runpod_flash import NetworkVolume, DataCenter

# existing volume by ID
vol = NetworkVolume(id="vol_abc123")
```
The docs show `vol = NetworkVolume(id="vol_abc123")`, but `NetworkVolume` currently requires a `name` field (no default) in the model. Either update the example to include `name=...` when referencing an existing volume, or make `name` optional when `id` is provided.
```diff
-vol = NetworkVolume(id="vol_abc123")
+vol = NetworkVolume(id="vol_abc123", name="my-existing-volume")
```
```diff
 async def _ensure_network_volume_deployed(self) -> None:
-    """
-    Ensures network volume is deployed and ready if one is specified.
-    Updates networkVolumeId with the deployed volume ID.
-    """
+    """Ensures all network volumes are deployed.
+
+    Deploys each volume in networkVolumes and collects their IDs.
+    Sets networkVolumeId (singular) for backward compat with the first volume.
+    Populates _deployed_volume_ids for multi-volume API payloads.
+    """
+    self._deployed_volume_ids: list[str] = []

     if self.networkVolumeId:
-        return
+        self._deployed_volume_ids.append(self.networkVolumeId)
```
`_ensure_network_volume_deployed` assigns `self._deployed_volume_ids` on a Pydantic model without declaring it as a field/private attr. With the current `BaseResource.model_config` (no `extra='allow'`), setting an undeclared attribute will raise at runtime, breaking deploy/update. Define `_deployed_volume_ids` as a Pydantic `PrivateAttr` (or stash it via `self.__dict__[...]`) so it can be set safely.
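A stripped-down illustration of the failure mode (plain-Python stand-in for Pydantic's attribute check; in the real code the clean fix would be declaring the attribute with `pydantic.PrivateAttr`):

```python
class StrictModel:
    """Minimal stand-in for a Pydantic model that rejects undeclared
    attributes, mirroring model_config without extra='allow'."""

    _declared = {"networkVolumeId"}

    def __init__(self):
        # Bypass __setattr__ for the declared field's default.
        self.__dict__["networkVolumeId"] = None

    def __setattr__(self, name, value):
        if name not in self._declared:
            raise ValueError(f"object has no field {name!r}")
        self.__dict__[name] = value

m = StrictModel()
try:
    m._deployed_volume_ids = []  # undeclared attribute: raises, like Pydantic
    raised = False
except ValueError:
    raised = True
assert raised

# Stashing directly in __dict__ bypasses the check (PrivateAttr is the
# idiomatic Pydantic equivalent of this workaround).
m.__dict__["_deployed_volume_ids"] = ["vol-1"]
assert m._deployed_volume_ids == ["vol-1"]
```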
src/runpod_flash/endpoint.py (Outdated)
```python
if isinstance(volume, NetworkVolume):
    return [volume]
if isinstance(volume, list):
    return volume or None
```
`_normalize_volumes` accepts a list but doesn't validate that every element is a `NetworkVolume`. If a caller passes a mixed/incorrect list, the failure will happen later (e.g., when calling `model_dump`), producing a less clear error. Consider validating list element types here and raising a `ValueError` that points to the bad element.
```diff
-    return volume or None
+    if not volume:
+        # preserve existing behavior: empty list normalizes to None
+        return None
+    for idx, vol in enumerate(volume):
+        if not isinstance(vol, NetworkVolume):
+            raise ValueError(
+                "volume list elements must be NetworkVolume; "
+                f"element at index {idx} is {type(vol).__name__}"
+            )
+    return volume
```
runpod-Henrik left a comment
1. Multi-volume drift detection blind spot

`_hashed_fields` includes `"networkVolumeId"` (singular) but not `"networkVolumes"`. After this PR, `_ensure_network_volume_deployed` sets `networkVolumeId` from the first volume only. Adding or removing a second volume doesn't change `networkVolumeId` → drift detection misses it → the endpoint doesn't redeploy.

2. Hash break for all existing endpoints

`_hashed_fields` includes `"datacenter"`. Before this PR, `datacenter` stored a `DataCenter` enum (defaulting to `EU_RO_1`). After, it stores `Optional[List[DataCenter]]`. Any existing state file has the old enum representation hashed in. On first re-run after upgrade, every endpoint without an explicit datacenter will see a hash mismatch → forced redeploy. Worth calling out in the changelog: users will see unexpected redeployments without understanding why.
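A tiny sketch of why the representation change breaks existing hashes (the values and `hash_value` helper are illustrative stand-ins, not the SDK's state-file format):

```python
import hashlib
import json

def hash_value(value) -> str:
    # Stand-in for hashing one entry of _hashed_fields.
    return hashlib.sha256(json.dumps(value).encode()).hexdigest()

# Before the PR an unset datacenter was stored as the default enum's value;
# after the PR it is stored as None.
old_repr = "EU-RO-1"
new_repr = None

# The hashed representation differs, so every endpoint without an explicit
# datacenter sees a mismatch on its first run after upgrading.
assert hash_value(old_repr) != hash_value(new_repr)
```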
3. Testing gap
The validation logic (single vs. list normalization, CPU datacenter constraint, duplicate DC check, volume-DC mismatch) is well tested. But the async deploy path — the core of the new feature — has no coverage:
| Path | Coverage |
|---|---|
| `_ensure_network_volume_deployed` with multiple volumes | None |
| `_deployed_volume_ids` population across mixed `is_created` / not-created volumes | None |
| `networkVolumeIds` payload injected into `_do_deploy` | None |
| `networkVolumeIds` payload injected into `update` | None |

Everything from "volumes are validated at creation" to "GraphQL receives `networkVolumeIds`" is untested. That's the entire runtime path for the feature.
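One way to start closing this gap is an async unit test against fakes; this sketch assumes nothing about the real classes beyond the behavior described in this review (every name here is a stand-in, not the SDK's API):

```python
import asyncio

class FakeVolume:
    """Stand-in for NetworkVolume with just the fields the test needs."""

    def __init__(self, vol_id: str, created: bool):
        self.id = vol_id
        self.is_created = created

    async def deploy(self) -> str:
        self.is_created = True
        return self.id

class FakeEndpoint:
    """Stand-in modeling the multi-volume deploy behavior under test."""

    def __init__(self, volumes):
        self.networkVolumes = volumes
        self.networkVolumeId = None
        self._deployed_volume_ids = []

    async def _ensure_network_volume_deployed(self) -> None:
        for vol in self.networkVolumes:
            if not vol.is_created:
                await vol.deploy()
            self._deployed_volume_ids.append(vol.id)
        # Backward compat: singular field holds the first volume's ID.
        self.networkVolumeId = self._deployed_volume_ids[0]

async def main():
    # Mixed already-created / not-yet-created volumes.
    ep = FakeEndpoint([FakeVolume("vol-a", True), FakeVolume("vol-b", False)])
    await ep._ensure_network_volume_deployed()
    assert ep._deployed_volume_ids == ["vol-a", "vol-b"]
    assert ep.networkVolumeId == "vol-a"

asyncio.run(main())
```

Equivalent tests against the real classes (with the GraphQL client mocked) would cover the `_deployed_volume_ids` population and the `networkVolumeIds` payload injection paths listed above.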
Nits
- `_normalize_volumes([])` in `endpoint.py` silently returns `None` instead of raising; a user passing an empty list gets no error.
- The `isinstance(self.datacenter, list) else [self.datacenter]` branch in `sync_input_fields` is unreachable; `normalize_datacenter` always produces a list before any model validator runs.
Verdict
Two real issues worth fixing before merge: the drift detection gap (#1) and test coverage for the async deploy path (#3). The hash break (#2) is an unavoidable consequence of the design change but should be documented.
flash is hardcoded to EU-RO-1. The `DataCenter` enum only has one value, `NetworkVolume.dataCenterId` is frozen to it, and `ServerlessResource.datacenter` defaults to it with a prod-only guard that syncs it to `locations`.

`DataCenter` now has all 10 listed RunPod DCs. The `datacenter` parameter on both `Endpoint` and `ServerlessResource` accepts a single DC, a list, a string, or `None` (no location restriction, matching the console's default "all datacenters" behavior). The old default of EU-RO-1 is replaced with `None`. `Endpoint(volume=...)` accepts a single `NetworkVolume` or a list. Network volumes are tied to individual DCs, so a one-volume-per-datacenter constraint is enforced in `ServerlessResource.normalize_network_volumes`. When multiple volumes are attached, the deploy path sends `networkVolumeIds` to the GraphQL API instead of the singular `networkVolumeId`.

CPU availability is limited to a subset of DCs. `CPU_DATACENTERS` (EU-RO-1, US-TX-1, EU-SE-1) is exported and validated against in `validate_cpu_datacenters` so you get a clear error.

Closes AE-2422
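The one-volume-per-datacenter constraint described above could be sketched like this (hypothetical helper operating on plain tuples; not the actual `normalize_network_volumes` implementation):

```python
def validate_one_volume_per_dc(volumes):
    """Reject configurations where two volumes target the same datacenter.

    `volumes` is a list of (volume_name, datacenter_id) pairs; returns a
    mapping of datacenter -> volume name when the constraint holds.
    """
    seen = {}
    for vol_name, dc in volumes:
        if dc in seen:
            raise ValueError(
                f"multiple volumes target datacenter {dc}: "
                f"{seen[dc]!r} and {vol_name!r}"
            )
        seen[dc] = vol_name
    return seen

# One volume per DC is accepted; a duplicate DC raises a clear error.
assert validate_one_volume_per_dc(
    [("vol-a", "EU-RO-1"), ("vol-b", "US-TX-1")]
) == {"EU-RO-1": "vol-a", "US-TX-1": "vol-b"}
```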